Reliable Learning of Bernoulli Mixture Models
نویسندگان
چکیده
Abstract In this paper, we have derived a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMM), when the number of clusters is unknown. A BMM refers to a random binary vector whose components are independent Bernoulli trials with clusterspecific frequencies. The problem of clustering BMM data arises in many real-world applications, most notably in population genetics where researchers aim at inferring the population structure from multilocus genotype data. Our findings stipulate a minimum dataset size and a minimum number of Bernoulli trials (or genotyped loci) per sample, such that the existence of a clustering algorithm with a sufficient accuracy is guaranteed. Moreover, the mathematical intuitions and tools behind our work can help researchers in designing more effective and theoretically-plausible heuristic methods for similar problems.
منابع مشابه
Clustering Binary Data with Bernoulli Mixture Models
Clustering is an unsupervised learning technique that seeks “natural” groupings in data. One form of data that has not been widely studied in the context of clustering is binary data. A rich statistical framework for clustering binary data is the Bernoulli mixture model for which there exists both Bayesian and non-Bayesian approaches. This paper reviews the development and application of Bernou...
متن کاملBYY harmony learning, structural RPCL, and topological self-organizing on mixture models
The Bayesian Ying-Yang (BYY) harmony learning acts as a general statistical learning framework, featured by not only new regularization techniques for parameter learning but also a new mechanism that implements model selection either automatically during parameter learning or via a new class of model selection criteria used after parameter learning. In this paper, further advances on BYY harmon...
متن کاملLearning mixture models – courseware for finite mixture models of multivariate Bernoulli distributions
Teaching of machine learning should aim at the readiness to understand and implement modern machine learning algorithms. Towards this goal, we often have course exercises involving the student to solve a practical machine learning problem involving a reallife data set. The students implement the programs of machine learning methods themselves and gain deep insight on the implementation details ...
متن کاملUsing Regression based Control Limits and Probability Mixture Models for Monitoring Customer Behavior
In order to achieve the maximum flexibility in adaptation to ever changing customer’s expectations in customer relationship management, appropriate measures of customer behavior should be continually monitored. To this end, control charts adjusted for buyer’s/visitor’s prior intention to repurchase or visit again are suitable means taking into account the heterogeneity across customers. In the ...
متن کاملBernoulli Mixture Models for Markov Blanket Filtering and Classification
This paper presents the use of Bernoulli mixture models for Markov blanket filtering and classification of binary data. Bernoulli mixture models can be seen as a tool for partitioning an n-dimensional hypercube, identifying regions of high data density on the corners of the hypercube. Once Bernoulli mixture models are computed from a training dataset we use them for determining the Markov blank...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1710.02101 شماره
صفحات -
تاریخ انتشار 2017